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Abstract 

Here we present the first genome wide statistical test for recessive selection. This 
test uses explicitly non-equilibrium demographic differences between populations to 
infer the mode of selection. By analyzing the transient response to a population 
bottleneck and subsequent re-expansion, we qualitatively distinguish between alleles 
under additive and recessive selection. We analyze the response of the average number 
of deleterious mutations per haploid individual and describe time dependence of this 
quantity. We introduce a statistic, Br, to compare the number of mutations in different 
populations and detail its functional dependence on the strength of selection and the 
intensity of the population bottleneck. This test can be used to detect the predominant 
mode of selection on the genome wide or regional level, as well as among a sufficiently 
large set of medically or functionally relevant alleles. 
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1 Introduction 

In diploid organisms, selection on an allele, or a group of alleles, can be categorized as 
additive, dominant or recessive, or as part of a more general epistatic network. A large 
body of existing work is devoted to statistical methods to detect and quantify selection 
using DNA sequencing data, including comparative genomics and the sequencing of 
population samples [1, 2, 3]. However, much less progress has been made toward iden- 
tifying the predominant mode of selection as additive, recessive or dominant. Genetics 
of model organisms and of human disease provide plenty of anecdotal evidence in favor 
of the importance of dominance [4]. Although genome-wide association studies suggest 
that alleles of small effects involved in human complex traits frequently act additively, 
estimation of genetic variance components from large pedigrees suggests a substantial 
role for dominance in a number of human quantitative traits [5]. Alleles of large effects 
involved in human Mendelian diseases, spontaneous and induced mutations in model 
organisms, such as mouse, zebrafish, or Drosophila, are frequently recessive [6]. In spite 
of these observations, the role of dominance in population genetic variation and evolution 
remains unexplored and no formal statistical framework to test for dominance coefficient 
is currently available. 

Using a combination of theoretical analysis and computer simulations, we demon- 
strate that recessive selection can be qualitatively distinguished from additive selection 
in populations that experienced a population bottleneck and subsequent re-expansion. 
Previous studies of non-additive variation in the presence of a bottleneck lack a complete 
description of the dynamics after re-expansion [7, 8, 9, 11, 3], or focus on epistatic interac- 
tions rather than recessive selection [12, 13, 14, 15, 16, 17], with the notable exception of a 
recent independently conducted complementary analysis found in [18]. Contrary to naive 
expectation, the number of deleterious recessive alleles per haploid genome is transiently 
reduced after a population bottleneck, while the number of additively or dominantly 
acting alleles is increased. In spite of a well-documented increase in frequency of some 
recessively acting variants in founder populations, the average number of recessive alleles 
carried by an individual is reduced as a consequence of the bottleneck. With the growing 
availability of DNA sequencing data in multiple populations, these results demonstrate 
the potential to directly evaluate the role of dominance, either on a whole genome level, 
or in specific categories of genes. 

Population bottlenecks are a common feature in the history of many human popula- 
tions. For example, the "Out of Africa" bottleneck involved ancestors of many present-day 
human populations. Numerous recent bottlenecks affected, among others/well studied 
populations of Finland and Iceland. More generally, bottlenecks followed by expansions 
are standard features in the recent evolution of most domesticated organisms. We suggest 
that complex demographic history may assist rather than complicate statistical inference 
of selection in population genetics. Here we use the distinct demographic histories of 
two subpopulations to identify the type of selection dominating the dynamics, and show 
that the average number of mutations per individual, (x), is dependent on the mode of 
selection. We introduce a measure B R (the "burden ratio" defined below) that provides a 
simple statistical test for any set of polymorphic alleles in the population, where B R < 1 
corresponds to predominantly additive selection and Br > 1 to predominantly recessive 
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selection, as shown in Figure 1. This test is not restricted to the simplified demographic 
model presented in this paper, but rather provides a quite generic qualitative test for the 
predominance of recessive selection in comparison between two populations, one of which 
experienced a bottleneck event. 



We work with a simple demography described by an ancestral population of No indi- 
viduals that splits into two subpopulations, one with population size N 0 equal to the 
initial population size ("equilibrium"), and one with reduced bottleneck population size 
Nb ("founded"). The latter population persists at this size for T B generations before instan- 
taneously re-expanding to the initial population size No, as shown in Figure 1. Time t is 
measured after the re-expansion from the bottleneck, as we are interested in the dynamics 
during this period. Quantities measured in the equilibrium population, and equivalently 
prior to the split, are denoted with a subscript "q". We consider only deleterious mutations 
with average selective effect of magnitude s > 0, such that s represents the strength of dele- 
terious selection. Extensions of this analysis to a full distribution of selective effects can be 
found in the SI. The initial population is in steady state with INqUh deleterious alleles in- 
troduced into the population at a mutation rate Ud per haploid individual per generation. 
In a steady state equilibrium, the site frequency spectrum (SFS) of polymorphic alleles is 
given by Kimura [19]. 



Here h > 0 is the dominance coefficient for deleterious mutations, where h = 1/2 corre- 
sponds to a purely additive set of alleles, and h = 0 corresponds to the purely recessive 
case. For the present analysis, we primarily focus on these two limits, contrasting their 
effects on the genetic diversity. The solution represents a mutation-selection-drift balance 
in which new mutations are exactly compensated for by the purging of currently poly- 
morphic alleles due to selection and extinction due to stochastic drift. In this way, an 
approximately static number of polymorphic alleles exists in the population at any given 
time. 



We follow the expected number of mutations per chromosome in the population, which is 
simply the first moment of the SFS. 



When multiplied by s, this is the effective "mutation load" of each individual in the additive 
case, but in the case of purely recessive selection this is not proportional to the fitness, as 
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A) Simple Demographic Comparison 



time 



No individuals 



ANCESTRAL POPULATION 



No individuals 



CONSTANT SIZE POPULATION 



Nb individuals 



Tb generations 



No individuals 



BOTTLENECKED POPULATION 



B) Additive Response to Bottleneck 
Br- 



bottleneck 



Br<1 



C) Recessive Response to Bottleneck 



Br 



Br>1 




bottleneck 



Figure 1: A schematic representation of two populations is presented above (A). Initially 
a single population prior to the bottleneck event, the populations split and have distinct 
demographic profiles. The equilibrium population has constant size for easy comparison 
to the founded population. The latter drastically reduces its population size to Nb for a 
short time T B during the founder's event. Our statistical comparison between populations 
B R is represented here for cases of purely additive (B) and purely recessive (C) variation. 
The statistic B R > 1 for recessive variation (dominance coefficient h = 0) and B R < 1 for 
additive variation (h = 1 /2), providing a simple test for the primary mode of selection of 
polymorphic alleles in the populations. 
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selection acts only on homozygotes. We refer to this statistic generally as the "mutation 
burden" to avoid assumption of any given mode of selection. Comparison between the 
mutation burden in the equilibrium and founded populations in the form of the "burden 
ratio", B R , provides a test for recessive alleles. 

< 1 for additive selection 

(3) 

> 1 for recessive selection 

To gain intuition for this qualitative difference, we work to quantitatively understand the 
population dynamics in a simple demography, first for purely additive selection, and then 
for purely recessive selection for comparison. 



Di? = 



ix") founded 



3.1 Additive selection and response to a bottleneck 

The initial site frequency spectrum (p^(x) for purely additive alleles is given by Equation 
(1) with h = 1/2. 

fl n 1 _ p 2N 0 s(l-x) 

*&> = #-x) l-e°» (4) 

Here 0 O = 4N 0 LTd. When 2N 0 s » 1, the SFS rapidly decays as x — > 1 simplifying the 
functional form[20]. We approximately compute the initial mutation burden as follows. 

/ s a f 1 e ~ 2N ° SX 1Ud ™ 
<x)o ado I x « (5) 



Now we deviate from equilibrium by reducing the population size to 2Ng chromosomes, 
representing a population bottleneck. The effect that a bottleneck has on the site frequency 
spectrum is twofold: a fraction of alleles are removed from the population due to increased 
random drift, and the mean of the remaining alleles occurs at higher frequency. The 
dynamics of the distribution <p(x, t) during such a change in demography can be computed 
from Kolmogorov's forward equation, as detailed in the SI. The first moment of the 
distribution, the mutation burden, follows the temporal dynamics derived from summing 
the Kolmogorov equation over all alleles in the genome, and takes the following form. 



d t (x) *U d -- ((x) - (x 2 )) (6) 

The burden of additive mutations is not directly affected by drift, as the drift term vanishes 
from the dynamics of the first moment, however the dependence on the second moment 
introduces an indirect dependence on drift. In the strong selection regime, in the limit 
where (x 2 ) «: (x), extinction of some alleles is exactly compensated for by an increase 
in frequency of other alleles. This is true in the equilibrium distribution prior to the 
bottleneck when N 0 s » 1, where (x) Q ~ 0(Ud/s) and (x 2 ) ~ O(Ud/(N 0 s 2 )). During the 
bottleneck, the mutation burden (x) monotonically increases; the second moment (x 2 ) 
increases, as well, reaching a maximum value in the case of a long bottleneck where it 
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scales as (x 2 ) ~ 0(Uci/(NbS 2 )). Provided N B s » 1, the second moment is guaranteed to be 
subdominant to the first moment, simplifying the dynamics as follows. 

d t (x) *U d - $ -(x) (7) 

For a bottleneck of duration T B , this equation admits solutions of the form, 



, /T , s> , . - s Ib 2U d _*tb 2U d 

(x(T B )) * (x) 0 e 2 + (l - e 2) = . (8) 

s s 

After plugging in the initial value (x(0)) = (x)o = 2U d /s, we find that the time dependence 
drops out completely, demonstrating that the population remains in mutations selection 
balance throughout the bottleneck. After instantaneous re-expansion to the initial popu- 
lation size, the dynamics of the distribution cp(x) are completely analogous to those inside 
the bottleneck in this limit, such that the mutation burden never deviates during the 
demographic perturbation. 

In the opposite limit of completely relaxed selection during the bottleneck, the dynamics 
of the mutation burden are completely driven by the influx of new mutations. 

d t (x) = U d (9) 

The net effect of this accumulation over the course of the bottleneck is simply the integral 
of this quantity. For a bottleneck with duration T B generations, the net effect of mutation 
accumulation due to relaxed selection is given simply by the following expression. 

<x(T B )> * (x) 0 + U d T B (10) 

Additionally, one can show that the second non-central moment gains an analogous con- 
tribution in addition to the net effect of drift. 

(x 2 (T B ))*(x 2 )o + lB(x)o + U d I B (11) 

Here we have expressed the second moment as a function of the bottleneck intensity 
I B = Immediately after re-expansion from the bottleneck, selection is again efficient, 
such that the dynamics are completely described by Equation (6). Although the second 
moment is increased due to relaxed selection during the bottleneck, we find that this 
increase is negligible in comparison to the direct accumulation in the first moment provided 
I B «: 1. As a result, the primary effect of the bottleneck in this limit is to accrue new 
mutations that are subsequently purged when selection is again efficient in the re-expanded 
population. The dynamics for the two limiting cases can be summarized as follows. 



(x(t)) fl 



ounded 



' ^ for 2N B s » 1 

2 ^ + U d T B e- s i for 2N B s « 1, I B « 1 



We note that (x) founded ^ {x)eq at all times in both limiting cases, and asymptotically decays 
to the equilibrium frequency on a timescale given by the strength of selection of the 
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accumulated deleterious mutations. In the case of an instantaneous bottleneck, we find 
that the mutation burden is only slightly shifted even if selection is fully relaxed, resulting 
in effectively no observable change in either limit. Our statistical measure, the burden 
ratio B R , in the additive case can be written approximately as follows. 

' 1 for 2N B s » 1 

(l + f^)" 1 > 1 for 2N B s «: 1, I B <zl 

As we will see in the following sections, recessive selection results in depleted mutation 
burden with corresponding values B R > 1, proving a contrast to the additive scenario and 
justifying our use of this statistic as a test for recessivity. 



g additive ^ _ \%/eq 



(.^founded 



3.2 Recessive selection and dynamics of the mutation burden 



Prior to the bottleneck, the initial site frequency spectrum for alleles under recessive 
selection is given by the h = 0 limit of Equation (1). 



<$(x) = 0c 



-2N 0 sx 2 



X (1 - x) 



1 - 



,2N 0 sy 2 



So d y 



,2N 0 sy 2 



(12) 



At low frequencies x < s/ANoS the spectrum decays slower than in the additive case, 
representing alleles protected from recessive selection by existing primarily in heterozy- 
gous form. In contrast, at high frequencies the spectrum decays faster than the additive 
exponential decay, falling off as e~ 2N ° sx2 . 



3.2.1 Instantaneous population bottlenecks 

First, we restrict our analysis to an instantaneous bottleneck with intensity I B = 1/2Nb, 
as this provides insight into the non-equilibrium response of the frequency spectrum to a 
downsampling event. Later, we extend our analysis to finite bottlenecks that persist for 
T B generations, with total intensity I B = T B /2N B . We represent the increase in drift due 
to a single generation bottleneck by downsampling. During this time step, N B diploid 
individuals are chosen at random from the initial larger population of No individuals. 

t B = 0) = j jdy (1 - y) 2NB " ,c (y)Vo(y) (13) 

Binomial sampling gives the distribution <p B of deleterious alleles with frequency x = 
k/2N B . There is a loss of allelic variation due to the bottleneck, corresponding to the k = 0 
term in Equation (13). 

Re-expansion is modeled as up-sampling the distribution <p B (x) from N B to No diploid 
individuals, which has negligible effect on the first and second moments of the distribution. 
As a result of drift to higher frequencies during the bottleneck, much of the existing 
variation appears in homozygous form immediately after the increase in population size. 
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These individuals are rapidly selected out of the population, driving high frequency alleles 
to lower frequencies on a very short time scale. Since drift is once again suppressed, 
selection becomes far more efficient, particularly for alleles of large selective effect. 

The time evolution of <p after the bottleneck is given by the forward Kolmogorov 
equation for recessive selection (see SI). The mutation burden follows the time dependence, 

d t (x(t))*U d -s(x(t) 2 ). (14) 



Here we suppress a selection term proportional to (x 3 ) of 0(1/ yNs) in analogy to the 
additive case. Since recessive selection depends quadratically, rather than linearly, on 
the allele frequency, the increased variance of the distribution drives the motion of the 
mutation burden. Alleles with frequency x > ^l/2No appear in homozygous form and 
are rapidly pushed down to lower frequencies. This happens on a time scale of order 
s" 1 / 2 and effectively reduces the variance, slowing the decrease in the mutation burden 
(x). New mutations introduced during this period slowly drift to appreciable frequencies, 
replacing those lost in the bottleneck. This process is drift controlled, rather than selection 
controlled, and thus occurs on a time scale of 0(2No) generations. As a result, the mutation 
burden quickly decreases due to selection immediately after the bottleneck until it slows 
to a stop, and then gradually increases as the population accumulates new mutations and 
re-equilibrates. 

A minimum in the mutation burden (x(t)) founded occurs when the time derivative van- 
ishes. This corresponds to a characteristic time scale associated with the selective effect s, 

where our statistical test Br = , ,, - is maximized. Since this time scale is shorter than 

\X/ founded 

the time scale of drift, we can imagine rescaling time by the effective population size 2Nq 
and then working in the perturbative regime t/2No 1. This allows us to Taylor expand 
near the re-expansion time t = 0 to understand the motion of the mutation burden at times 
soon after the bottleneck. 



d t (x(t)) xU d -s 



(x(0 2 )| t =o + td t {x(t) 2 )\ t=0 + p 2 (x(t) 2 )\t=o + 0(t 3 ) 



(15) 



To understand the time dependence of (x 2 ), specifically the time derivative, we analyze 
the higher moments in the same fashion as employed for the first moment in Equation 
(14). All relevant moments are computed in the SI and we note sufficient convergence 
to validate this expansion. This allows for the re-expression of Equation (15) to second 
order in t in terms of the first three moments of the site frequency spectrum immediately 
after re-expansion. The moments of the post-bottleneck initial distribution can be written 
in terms of the initial equilibrium distribution using the integral form given in Equation 
(13). Details of this calculation appear in the SI. In the strong selection limit 2NoS » 1 
these initial equilibrium moments are readily approximated by standard convolutions 
of a polynomial with a Gaussian. Suppressing subdominant contributions in the limit 
N 2 , » NoS, we find the following approximation to the trajectory of the mutation burden 
immediately after the bottleneck re-expands. 



V5V _sM , 3sf2 
V 2N B ! + Ud 2N E 



<*(*)> ~ U d ^-^ 1 - — + U d — + 0(t 3 ) (16) 
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Concentrating on this second order expansion in t, we find that the curve first drops from 

its initial value (x(0)) = Ud -J^/ quickly reaches a minimum, and is then brought back 
up by the the positive second order term. The location of the minimum is easily found to 
have the following parameter dependence. 



tmin °C W (17) 

The second derivative is positive at this extremum, implying a local minimum. Plugging 
t min into our expression for (x(t)) in the limit NqS » 1, we find the following minimum 
value for the average number of recessive deleterious mutations per genome following a 
bottleneck. 

"-""fe-zik) (18) 

We note that (x) 0 ~ JiL- is the approximate mutation burden for the equilibrium distribu- 
tion in the 2NoS » 1 limit, allowing us to simply write the extreme value of the Br statistic 
as follows. 



Mtmin) >1 (19) 

We find the following dependence on time in immediate response to a population bottle- 
neck. 

/ st 3s 3/2 f 2 \ _1 

jg recessive (j. j „ j _ « + + Qtfs > j (2Q) 

\ 2N B 2N B V4No / 
This expansion is only valid in the small time limit where the quadratic term is subdom- 
inant, such that all values are positive. Long before this simple quadratic expression 
becomes negative, higher order contributions become relevant and dominate. As seen 
in simulations described in the following section, for recessive deleterious mutations, the 
burden ratio remains positive at all times. 

This precise result applies strictly in the limit of a strong, single generation bottleneck, 
where N 0 » N B . Additionally, the technique used to compute integral expressions re- 
quired the strong selection limit 2NoS » 1. Analysis of higher order contributions to the 
trajectory are made substantially easier by restricting to the limit 2N B > V2N 0 s, which 
happens to be biologically reasonable, for example, in human populations where most 
examples of founding events are on the order of No ~ 10 4 and N B ~ 10 3 (see further dis- 
cussion in the SI on general dominance coefficients). Despite these analytic restrictions in 
parameter space, our simulations described below indicate that the signature of B R > 1 is 
ubiquitous for populations under predominantly recessive selection. 

3.2.2 Extended population bottlenecks 

We argue that for the case of relatively low intensity bottlenecks, where intensity is defined 
as I B = T B /2N B «. 1, we can approximately express the magnitude of B R using a simple 
substitution (2N B ) _1 — > I B . This is equivalent to the claim that for low intensity bottle- 
necks, the B R statistic depends only on the ratio of the bottleneck time to the bottleneck 
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0 1000 2000 3000 4000 5000 

time after re-expansion (t obs ) 



Figure 2: The time dependence of B R (t) after a population bottleneck is shown for various 
selective coefficients. Peak Br values vary in both magnitude and time as a function of s. 
The founded population was simulated with 2Nn = 20000, 2N B = 2000, and T B = 200 and 
plotted for 5000 generations after re-expansion. 

population size, and any explicit dependence on T B occurs in subdominant contributions. 
This intuition is confirmed by simulations described in below, where we show that the 
accuracy of our analytic approximation breaks down as I B — > 1 and the intensity becomes 
non-perturbative. For short bottlenecks with I B < 1/10, the approximation of an instan- 
taneous single generation sampling event remains sufficiently accurate, even for strong 
selective coefficients s ~ 0.1. Under this trivially extended instantaneous approximation, 
B R (t) can be written in terms of the intensity of a short bottleneck as follows. 

B R extended (t) ~(l-I B |s£ - + 0(f 3 )JJ > 1 (21) 

The Br of maximum effect, has a magnitude given approximately by, 

B R extenAe \t min )~[l-h^^ . (22) 

For illustration of the behavior described in the above analytics we present a time 
series of recessive simulations with curves representing various selection coefficients in 
Figure 2. The time dependence of the B R statistic is plotted to demonstrate the simulated 
population's response to a founder's event. Crucially, we find that the peak B R values 
vary in both magnitude and time as a function of s, as is consistent with our analytic 
understanding and intuition. 
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3.3 Transient response and time of observation determine detectable 
selection coefficients 

Thus far, we have detailed the dynamic dependence of a set of alleles in a population, all 
with selective effect s, in response to demographic perturbation in the form of a bottleneck. 
Notably, for recessive selection, a peak response occurs in the Br statistic at some time f mz „ 
after re-expansion. In general, both the magnitude of B R (t min ) and the time of the peak 
itself depend sensitively on the selection coefficient. In general, a distribution of mutations 
with different selective effects will be present, many of which may be simultaneously 
polymorphic in a given population. Since alleles of different selective effect respond to 
the bottleneck on different time scales, one can ask what selective effect is most likely to 
be observed at a given time. For example, very strong selection has the tendency to peak 
and subsequently re-equilibrate immediately after the bottleneck, such that observation of 
alleles with large s is substantially more difficult at later times. On the other hand, alleles 
under relatively weak selection have a peak effect at very late times, such that at the time 
of data collection a statistically significant response may not yet have occurred. 

We would like to understand the transient behavior of the burden ratio B R (t), as well 
as the value of the selection coefficient s for which B R is largest at a given time. When 
comparing to population data, one has little control over the demographic history, and 
thus it becomes important to understand the selective coefficient that dominates at the time 
of observation. According to the time dependent expression in Equation (21), we expect 
the effect to decrease quite rapidly for very large s. However, the peak occurs quite early 
in the case of larger s values, allowing the mutation burden to equilibrate over a longer 
period of time between the peak and observation to return to mutation burden values close 
to Br ~ 1. This tells us that the equilibration process is what reduces the magnitude of B R 
for large s. In the case of very recent bottlenecks, the large s values dominate, but for later 
times of observation, this signal has partially equilibrating, potentially allowing a smaller 
s value to dominate the statistic. At a given time of observation t obs , one can represent 
Br(s, t obs ) as a function of various selection coefficients s. Figure 3 represents B R (s) for a 
fixed t 0 bs for various dominance coefficients h. We concentrate here on recessive variation 
with h = 0, but note that a crossover occurs at some value h c where additive and recessive 
effects offset each other in the B R statistic (detailed in SI). Based on our analytics, we expect 
the peak to shift from extreme high s values at early times to extreme low s values at late 
times, eventually dissolving into neutrality. We take the s derivative of Equation (21) to 
find the maximum at t obs . 

d s B R (s, t obs )\ s=Sma oc -I B t obs + — y- = 0 (23) 

2V4No 

^16^^ 

max m\ m\ { ] 

obs obs 

One can easily show that the second derivative evaluated at this point is negative, con- 
firming that this is a maximum. This result matches our intuition: maximum s values of 
Br(s, f) are found at high s for early times, s max (t — > 0) » 1, and at low s for late times, 
Smax(t -> t») « 1. This is qualitatively observed in our simulations by comparing the 
relative values of B R (s) as a function of time. 
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7!o Ol 001 0~001 0.0001 0.00001 0 

selection coefficient (s) 



Figure 3: At the time of observation t 0 \, s , the value of B R (t 0 \, s ) is determined by the average 
strength of selection s for additive or recessive variation, or variation with any intermediate 
dominance coefficient h e [0, 1]. A range of B R values observed at a single time slice are 
plotted for various s values. Different dominance coefficients appear as solid lines with 
fully recessive selection (h = 0) at the top and purely additive selection (h = |) at the 
bottom. Br approaches one both in the limit of very strong selection s — > 1 due to the 
rapid transient response, and in the very weak selection limit s — > 0 due to the nearly 
neutral insensitivity to the bottleneck. For some intermediate dominance coefficients h c , a 
crossover occurs (h c ~ | in the example shown) where the effects of additive and recessive 
variation cancel such that B R (h c ) ~ 1. The parameter dependence of the crossover is 
explored analytically in the SI. 
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As the effect is transient, we can define a relaxation time t re \ ax corresponding to the 
vanishing of any response to the bottleneck. This is given by determining when s max is 
dominated by effectively neutral variation at roughly s max ~ l/2No. After this time, B R (s, t) 
cannot be differentiated from one for any s. 

IN 

t relax <—A<2N 0 (25) 

VTo 

We note that the return to equilibrium happens on a time scale faster than random drift, 
even for the weakest selective effects, thus validating our perturbative approximations 
using f/2No <s 1. Higher order time dependence in Equation (21) may substantially 
correct this estimate, but we feel that the presentation of this methodology is conceptually 
important and provides a greater understanding of the transient dynamics of population 
response to bottlenecks. As it is relevant to human populations, we note that if both 
populations expand exponentially after the bottleneck, the effect may persist long beyond 
trelax- This is explored analytically in the SI and in simulations in an accompanying paper 
[21]. 



4 Comparison of analytic results to simulation 

We checked our analytic results using a forward time population simulator, described in 
detail in the SI. Given the ubiquity and analytic simplicity of the exponential decay in the 
additive scenario, we focus here on our predictions for recessive variation. We compare 
analytic expressions of B R (t min ) at the peak response given in Equation (22) for various 
selection coefficients. We simulated a wide range of bottleneck parameters to test the 
limitations of our theoretical understanding. In Figure 4, we demonstrate the accuracy 
of our analytic results, by plotting the ratio of the simulated values of B R (t max ,s, I B ) to our 
analytic predictions B R (t max ,s,I B ) as presented in Equation (22). We arrange our simulated 
data by bottleneck intensity I B , as we expect the instantaneous bottleneck approximation to 
break down as intensity is increased due to longer bottleneck duration Tg » 1. As plotted, 
complete agreement between simulated data and analytic predictions is represented by 
a flat line at B R sim IB R amlytk = 1. As expected, we find deviations as we approach the 
limitations of our perturbative approximation roughly around T& ~ 2N B /10 when I B ~ 0.1. 
Below these higher intensities, we find quite good agreement for all parameter sets well 
below 10% error, even at I B = 0.05. 



5 Discussion 

The increase in prevalence of recessive phenotypes following population bottlenecks has 
been attracting the interest of geneticists for a long time [7, 22]. Theoretical analysis of allele 
frequency dynamics in a population expanding after a bottleneck suggested that frequency 
of an individual allele may rise due to increased drift [22, 23, 24]. Here, we focus on a more 
general question of the collective dynamics of recessively acting genetic variation. Sur- 
prisingly, our analysis suggests that the number of recessively acting variants per haploid 
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Br (tqnin) 




Figure 4: Maximum response values of the burden ratio B R (t min ) are plotted for recessive 
selection as a function of bottleneck intensity. A wide range of parameter sets are plotted 
with all combinations of 2N B = {2000,1000,400,200,100}, s = {0.1,0.02,0.01,0.001}, T B = 
{200,100,50,20,10} , each simulated for 10 8 nucleotide sites. For relatively low intensity 
bottlenecks we note excellent agreement over the parameter ranges plotted. Intensities 
with I B = T b /2Nb > 0.1 are excluded, as the instantaneous bottleneck scaling breaks 
down in favor of a long bottleneck scaling. The approximation necessarily weakens for 
simulations that represent longer bottlenecks, and only for strong selective coefficients, as 
expected. This quantifies the limitations of the instantaneous bottleneck approximation, as 
we observe substantial deviation only around I B = 0.1 and with selection strength s = 0.1. 
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genome is reduced in response to a bottleneck and subsequent re-expansion. Generally, 
we have demonstrated that the frequency spectra of recessive deleterious polymorphisms 
behave distinctly from additively acting variation following a population bottleneck and 
subsequent re-expansion. The response of additive variation depends crucially on the av- 
erage number of deleterious alleles, and on the number of generations for which selection 
is relaxed during the bottleneck. In contrast, the dynamics of recessive variation crucially 
depend on the width of the site frequency spectrum, rather than the average number of 
mutations per individual, such that the accumulation of deleterious mutations can respond 
strongly even to a single generation bottleneck. Importantly, the temporal dynamics of 
the accumulation of deleterious alleles depends qualitatively on dominance coefficient 
and quantitatively on selection coefficient. The qualitative dependence on dominance 
coefficient allows for a robust statistical test for recessivity. If the variation is additive, 
the number of deleterious variants per a haploid genome is larger in a bottlenecked pop- 
ulation than in a corresponding equilibrium population. If the variation acts recessively, 
this number is smaller. The selection coefficient determines the timing of response to a 
bottleneck. 

By explicitly analyzing the non-equilibrium response to a bottleneck, we have demon- 
strated a technique for using potentially confounding demographic features to probe the 
underlying population genetic forces. In realistic populations, for example in modern hu- 
mans, substantial work has been done to identify and understand the recent demographic 
history of geographically disparate populations [25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. In the 
case of the "Out of Africa" event, a historically substantiated and believable demographic 
model can be used to model the difference between African and European populations 
since their divergence. Comparison between populations that have and have not under- 
gone a bottleneck can be used to infer plausible selection and dominance coefficients. In 
an accompanying paper [21], we specialize this analysis using a realistic demographic 
model to attempt to bound the selection and dominance coefficients in modern human 
populations. Parameterizing only by the duration of the bottleneck Tg, along with s and 
h, one can show that a substantial fraction of this three dimensional space is disallowed 
by the observation of even a single bottleneck. 

Although the net number of recessive deleterious mutations is reduced as a conse- 
quence of a founder's event and subsequent re-expansion, the fitness of individuals carry- 
ing these alleles is not increased, but rather decreased; selection acts only at homozygous 
sites and the number of homozygotes is known to increase after a population bottleneck. 
However, the number of heterozygous deleterious sites, or the average carrier frequency 
for associated alleles, is suppressed, such that the mating of individuals from disparate 
bottlenecked populations may result in a decreased incidence of recessive phenotypes in 
such mixed lineages. In studies of model organisms, this may have applications when 
comparing laboratory populations founded from a few wild type individuals to their 
corresponding natural population. 

In principle, the results of this study are applicable to the analysis of specific groups of 
genes and pathways. Sufficiently large subsets of alleles that are medically relevant may 
be analyzed in humans to identify the mode of selection for candidate variants of recessive 
diseases. For model organisms with a significant density of deleterious alleles, it may be 
possible to create a dominance map of the genome. 
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In sum, the non-equilibrium dynamics induced by demographic events is an essen- 
tial, and indeed insightful, feature of most realistic populations. Population bottlenecks, 
abundant in laboratory populations and in natural species, have the potential to provide 
a novel perspective on the role of dominance in genetic variation. 
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Supplemental Information 

Appendix I: Integration of the Kolmogorov Equation 

The Forward Kolmogorov Equation is generally used to describe the probability distribu- 
tion of the trajectory of an allele in frequency space within a population [19, 35]. If one 
analyzes the frequency of all polymorphic alleles in the population, or the site frequency 
spectrum (SFS), one can describe the collective dynamics of this distribution in a very 
similar form using an infinite sites model. The Kolmogorov equation describes the time 
dependence of the probability density p(x, t) of an allele's frequency x at some time t. In 
the limit of a large number of simultaneously polymorphic alleles, one can think of all 
points in this probability distribution as being filled by one or more alleles. Choosing to 
focus on the case of purely recessive variation, one can write down the time dependence 
of the SFS in the following form. 



d t (p(t,x) = sd x (x 2 (l - x)(p) + ^ 2 x (x(l - x)(p) + 2NU d 5 



1 

x 

2N\ 



(26) 



The presence of the delta function represents an influx of new mutations into the spectrum 
at initial frequency 1/2N coming from 2N individuals in the population, each with a 
mutation rate U d per individual per generation. This acts like a source in the SFS at 
x = 1/2N, and is a reasonable approximation in the limit of a long genome with no 
double mutations or back mutations. We are interested in the time dependence of specific 
moments of this distribution. For example, to determine the time dependence of the first 
moment of the distribution (x), we multiply by x and integrate to find the time dependence 
of this moment. 



d t {x) = ^ dxx sd x (x 2 (l - x)(p} + dxx -^d 2 x {x{l - x)(p) + dx x 2NoUd 5 



x - 



2N 0 



(27) 



The delta function integral is trivially computed and we integrate by parts once on each of 
the other integrals, noting that the boundary terms vanish due to the x 2 (l - x)cp(x) factor 
under the derivatives, which scales as x at low frequencies x — > 0 and decays rapidly as 
x — > 1 provided selection is efficient. 

d t (x) = - J dxs (x 2 (l - x)(p) - J dx^-jd x (x(l - x)(p) + U d (28) 

The drift term can be identified as a total derivative, which vanishes, leaving the following 
dynamical equation for the mutation burden. 

d t (x(t)Y ecesswe = U d -s ((x(t) 2 ) - (x(t) 3 )) (29) 

The time dependence of all higher moments can be computed in a completely analogous 
way. Since it is relevant for our present purposes, we note the equation of motion for the 
second non-central moment. 

d t (x 2 (t)Y~ = ijl - 2s ((x(tf) - (x(0 4 )) + ^ (<*(0> - (x 2 (t))) (30) 



19 



Downloaded from http://biorxiv.org/on September 18, 2014 



Equations of motion for moments of the site frequency spectrum of alleles under 
purely additive selection can be computed in the same way Here we cite these results for 
convenience. Note that we are using the convention s add = hs = s/2 

d t (x(t)T mive *U d - S - (<x(f)> - <x 2 (f)>) (31) 

dt{x 2 {t) yMH«,e *^L-s (<X 2 (f)> - <X 3 (f)>) + ^ (<*(*)> " <^(0>) (32) 

In the limit V2Ns » 1, the SFS for recessive alleles rapidly vanishes at high frequencies 
such that we can drop the (1 - x) dependence to find the following approximate equation 
of motion. 

^t))^ ~ U d - s{x{tf) for y/lN~s » 1 (33) 

The (1 — x) contribution in the dynamics of higher moments can be similarly neglected. For 
alleles under additive selection, the analogous strong selection limit is given by2Ns » 1, 
which results in the following simplified dynamics. 

^Wfi * U d - $ -(x(t)) (34) 

Notably this equation of motion is diagonal and can be easily solved analytically as is the 
case for all higher moments of the SFS for alleles under strong additive selection. 



Appendix II: Analytic calculation of the trajectory of the mutation burden 
for recessive selection 

Here we are interested in the motion of the first moment (x(t)) of the distribution <p(t) 
after re-expansion from the bottleneck. First, we consider the equation of motion given by 
Equation (14), which is derived above. We repeat it here for the convenience of the reader. 

d t (x(t))*U d -s(x(t) 2 ) (35) 

Since the time scale on which the mutation burden rises to a maximum is shorter than the 
time scale of drift, we can imagine rescaling time by the effective population size 2No and 
then working in the perturbative regime t «: 1. This allows us to Taylor expand near t = 0 
to understand the motion of the burden at early times immediately after the bottleneck. We 
later determine all of the moments used below and see sufficient subsequent suppression 
to validate this expansion. 

a 



d t (x(t)) « U d - s 



(x(0) 2 ) + td t (x(t) 2 )\ t=Q + !^<x(f) 2 >b=o + 0(t 3 ) 



(36) 



To understand the time dependence of (x 2 ), we analyze the next moment in the same 
fashion as employed for the first moment, as described in the previous appendix and 
given in Equation (30). 

d,{x2) *wr 2s(x>)+ w„ (x) <37) 
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Note that these moments and all higher moments have a non-negligible contribution from 
the diffusion term in the forward equation. 

We model an instantaneous bottleneck as a single generation downsampling of 2N B 
chromosomes out of the original population of 2No chromosomes. We can approximately 
compute (x 2 ), (x 3 ), and higher moments if desired, immediately after bottleneck sampling 
(denoted "after") since we have an integral form for <p B (x) given by appropriately scaling k 
in terms of x in Equation (13). Here, (po represents the initial pre-bottleneck site frequency 
spectrum, and the n th moment of this distribution is represented as (x n )o- 



{X 2 ) after ~ ^ 



\tf L k " ffc j f dx (1 - *) 2Nb "^)^o(*) (38) 



The exchanging the order of the integral and the sum, the sum can be computed directly 
as a function of x corresponding the the second non-central moment of the binomial 
distribution. One can compute (x 3 ) completely analogously. 

(*%ter = L k " j fdX (I- x) 2N °- k (x) k (P 0 (x) (39) 

The first three non-central moments of the binomial distribution are as follows: 

\i\ = 2N B x (40) 
H' 2 = 2N B x(l-x) + (2N B ) 2 x 2 (41) 
[i' 3 = 2N B x(l-x)(l-2x) + 3(4:Nlx 2 (l-x) + 8N 3 B x 3 )-l6Nlx 3 . (42) 

In the limit Ng » 1, the second and third moments are well approximated by the following 
expressions. 

y! 2 * 2N B x + (2N B ) 2 x 2 (43) 
H' 3 * 2N B x + 3(2N B ) 2 x 2 + (2N B ) 3 x 3 (44) 

From this we can directly compute the sum in Equations (38) and (39). 



•fter 



fdxn' 2 (p 0 (x) 

dx {w B +x2 ) (po{x) 



2N L 

For the third moment, we find the following expression. 

(X 3 )after = J dx ^ (f> 0 (x) 



(2N, 

I 

{X) ° + (x 2 )o (45) 



dx{ wj 2+ 2w B +x3 r {x) 

(x) Q , 3(x 2 ) 



+ + <^ 3 )o (46) 



(2N B ) 2 2N B 
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The third moment is relevant in that it allows us to approximately compute the time 
dependence of the second moment immediately after re-expansion. 

dti* 2 ) after ~ 



(47) 

All of the (x m )o moments can be computed from the initial distribution, determining the 
Taylor expanded expression in Equation (36) explicitly These integrals are well approxi- 
mated in the limit NoS » 1, as described in a following appendix. We calculate the integrals 
using this approximation and express the first three moments below, the first two of which 
were described originally in [20]. 

. Ud V4No 

(x)o ~ 



s 

<* 3 >o ~ (48) 

Additionally, we are working under the approximation of a relatively short bottleneck, 
such that (x) a fter ~ {x)q + UdT B ~ (*)o- Corrections can easily be computed to determine 
the T B dependence, if desired. Plugging these in, we can gauge the order of magnitude 
and sign of the initial contributions to the motion of the mutation burden. 

d t {x 2 ) af ter U d I ViNpS 3 1 \ 2 

~ _^£_A (49) 
2N 2 B N B { * V 

Note that the N Q 2 terms exactly cancel in the previous equation and that we have sup- 
pressed C^Nq 1 ) corrections. Putting these results together, we integrate Equation (36) to 
find the following time dependence (x(t)). 

(x(t)) * <x) 0 + U d t - st(x 2 )\ t=0 - s \pj d t (x 2 )\ t=0 + 0(t 3 ) (50) 

Here the integration constant is simply the initial first moment immediately after re- 
expansion (which is well approximated by (x) a f ter = (x) 0 in the case of a strong instanta- 
neous bottleneck). We substitute our computed value from Equations (48) and (49) in the 
above equation to compute the time dependence of the mutation burden (x(t)). 



<*(0> lm (l _^\ + sf(^l +3 \ m (51) 



U d V s V 2N B J 2N B \ 2N B 
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At this point, we generalize to non-instantaneous, but low intensity bottlenecks with the 
substitution — > I B = -j^. By doing this we have matched the bottleneck intensity to 
that of a more extreme, but single generation bottleneck. The time dependence of the 
mutation burden can be approximated as follows. 



(x(t)) ~ (x) 0 II - stI B + st 2 I B \sI B + 3 J^f\\ ( 52 ) 



From this we can easily compute the time dependent form B R (t) 



~ <*(*)>• 

1 



B R (t) ~ M - stI B + st% \sI B + 3 y — j j (53) 

This quadratic time dependence allows us to find extrema. Note that the inclusion of higher 
order contributions allows for a more accurate temporal dependence (x(t)), however this is 
somewhat unnecessary to understand the dominant behavior of the curve. Concentrating 
just on this second order expansion in f, we find that the curve first drops from its initial 

value (x(0)) = Ud r quickly reaches a minimum, and is then brought back up by the the 
positive second order term. The location of the minimum can be found approximately by 
solving the following equation. 

d t (x(t min )) = 0 = -I B V4NoS + hlstmin (l B ^J4N^s + 3) (54) 




tmin ~ t — = r ~ ■= \sh + 3 JttH (55) 

As expected, the second derivative is positive at this extremum, implying a local minimum. 

d 2 t {x{t min )) = 2sI B (l B ^JW^ + 3) > 0 (56) 

Plugging \. m in into our expression for (x(t)), we find the approximate magnitude of the 
mutation burden at this minimum. 



(x(t m j n )) ~ (x)q 



1-4 + 



1 B V4NqS; 



(57) 



We have factored out (x} 0 ~ J^- here since it allows for easier calculation of B R = (x)o/(x) 
below. Thus, in the limit NoS » 1 employed to approximate (x)o, we find the following 
minimum value for the average number of recessive deleterious mutations per genome 
following a bottleneck. 



(x(t min )) ~ 6q 



V4Noi (4V4Noi + 12/Z B ) 



(58) 
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From this expression, we can immediately calculate the peak value for the Br statistic as 
follows. 

R (t \ - {X) ° 



(x(^ nnn )) 

12 



1- 4 + 



4Jg V4NqS + 12 

3I B V4NqS + 12 



(59) 



We note that in the limit Ng » NoS, which is biologically relevant for humans, these 
results simplify as follows. The time dependence of the mutation burden for the founded 
population is given by, 

<%^^ (1 _ s „ B) + 3s ft, (60) 

This can be used to obtain the functional dependence of B R (s, t). 

i i \ -i 



B R (t)~\l-stI B + st 2 I B 3 J— I (61) 



In the limit Ng » N 0 s, the peak response B R (t mi „) occurs at a time, 



f 1 4N ° (M\ 

6 V s 

and takes the following approximate functional form. 

B.ft*) ~ (l " ^^)"' (63) 

We use this expression to compare to simulations in this regime of interest, with the 
understanding that it breaks down at relatively large bottleneck intensities. 



Appendix III: Distribution of selective effects 

For a distribution of s effects, the s of maximum effect on Br is dependent on the time 
since the bottleneck as given in Equation (24). This describes the transient shift of the 
elevated load ratio towards smaller s values. To determine the total B R observed a t the time 
of observation t obs , one must integrate over all s values present in the population. This 
assumes that distinct s classes for recessively acting deleterious alleles can be thought 
to behave independently in a well mixed, freely recombining diploid population. The 
distribution of selective effects for de novo mutations, p(s), provides the appropriate 
weight associated with each class of selective effects, as the mutation rate for mutations of 
selective effect s is given by Udp(s). Assuming a static distribution of selective effects, we 
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can calculate the observed load ratio at t 0 \, s . For a given population, the observed mutation 
burden (x) ohs at the time of observation is the mutation burden for each class of selected 
effects averaged over their representative fraction of new mutations into the population. 

(x(t))° bs = J dsp(s)(x(s,t)) (64) 

This is true for both the equilibrium and founded populations, allowing us to compute 
the observed burden ratio Br o1,s as follows. 

n obs,, v <*>? pSp(s) {X{s,t 0bs )) eq 

Br {tabs) = ——r = -7T- — (pb) 

W founded J ds P( S ) \ X ( S ' tabs)) founded 

The largest contribution to the load ratio at time t 00s occurs at some effective s max , 
denoted by s° bs ax rved . The distinction here is that, although s max may have the largest mutation 
burden, it may occupy only a small fraction of the mutations present in the population 
when weighted by p(s) and thus have a reduced effect on the observed burden ratio B R obs . 
The maximum contribution to the mutation burden s°^ s ax satisfies the following constraint. 

d s (p(s)<j(s, i)) founded) l s * = 0 (66) 

Although we remain agnostic to the distribution of selective effects in the present work, we 
mention that the model of an exponentially decaying distribution p(s) ~ e~ ys is somewhat 
popular in the literature for theoretical, experimental, and aesthetic reasons. As a result, 
the introduction of such a distribution (or more generally any monotonically decaying 
distribution) would produce an s™ fl * value in the range, 

Smax > Sobs > 1/2N 0 . (67) 

The selective effect for which the observed change to the load ratio Bfl(f 0 f, s ) is maximized 
has suppressed signal relative to slightly lower s values. This is due to the effective 
rarity of high s mutations in the population, as they both are introduced at a lower 
rate p(si arge ) < p(s sma n) and are being more efficiently purged from the population due to 
selection. This indicates that the elevated load ratio B R (t obs ) may be most readily observed 
in the data by looking at mutations with low to intermediate selective effects, rather than 
those with highest effect. Additionally, we note that the corrected equilibration time for 
the distribution of effects is given by the time constant associated with s 0 fe- 

Most generally, the mutation burden will be comprised of a combination of alleles with 
varied selective effects and dominance coefficients. Treatment of alleles with intermediate 
dominance coefficients is discussed below. We can generalize our observed burden ratio 
as follows. 

r> obs/ v fdhfdsp(s,h)(x(s,h,t obs )) eij 

Br (tabs) = —-7Z = r „ r . ~ V 68 ) 



founded 

fdhfds p(s,h) <*(s,Mofa)>/. 



ounded 
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Appendix IV: Exponential expansion and more general geometries 

Exponential expansion is a general feature of many natural populations, particularly after 
a founding event, motivating the generalization of our analysis to such cases. In this 
work, we describe the transient behavior of B R (s, t), and the s values that are favored as 
time progresses. As a result, this behavior is extremely sensitive to exponential expansion, 
for example, as opposed to the simple square bottleneck model described above. In the 
most general case, we may have a general time dependence for the population size after the 
bottleneck, which sensitively effects the s values for which the burden ratio Br is largest. 
For an explanatory example, we will model the immediate exponential inflation of the size 
of both the founded and equilibrium populations after re-expansion from the bottleneck. 

N f (t) ~ N 0 e t/a (69) 

We rescale time by the population size t l = t^rr = te~^ a , yielding exponentially slowed 

"inflated" time in the decelerated frame of the fixed population size. In this rescaled frame 
we can analyze the shift of the transient peak of the load ratio (in inflated time) B R I (s I , t 1 ) 
by plugging our new scaled time into Equation (24). 

s' (70) 

Note that this factor of No refers to the initial population size prior to the bottleneck, 
and does not get rescaled due to the inflating population size. Taylor expansion of the 
exponential demonstrates that there is a perturbative crossover at time t ~ 2a. 

1 1 2 2 2 2 3 t \ 

s ~ K (? + 5 + 2? + S? + '") <71) 

When t ~ a, the third term in the expansion, initially the quadratic term of the exponential, 
finally begins to dominate over the second term in the expansion. At this point positive 
powers become technically relevant in the perturbative expansion. This is the transition 
between the initial transient decrease in s max and the exponential freezing out of the rapidly 
decaying large s components of B R . At this time, the maximum of the load ratio is given 

by, ' 

2N 0 e 2 2No 

max ~ 10a 2 ~ a 2 ' K } 

For very rapid inflation, a is small, indicating that the dominant modes in Br still exist at 
high s values, such that s max » 1. For large a » 1, corresponding to slow, even adiabatic, 
expansion, the transient rapidly decays towards smaller s values, such that s max <£ 1. 
Intermediate values are particularly interesting, as the rate of expansion can actually 
compete dynamically with the transient decay. In this case, any intermediate selection 
effect may be frozen in, dominating the signature in the burden ratio B R . 
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Appendix V: The £>d statistic and linearity 

Here we introduce a completely analogous statistic to Br. An equally valid comparison 
between the two populations of interest, Bo measures the difference in the mutation 
burden, rather than the ratio. 



Bo — equilibrium (-^) founded) 



> 0 for additive selection 

(73) 

< 0 for recessive selection 



There are two primary distinctions between the Bo and Br measures. First, the Bo measure 
is linear and thus the magnitude provides an even comparison between additive and 
recessive variation. Since it is a difference rather than a ratio, the space of allowed additive 
values remains in the range B D addltwe e [0, oo], and recessive in the range B D reccsswe e [-oo,0]. 
For comparison, the ratio B R addltlve e [0, 1], and B R recesswe e [1, oo], which is clearly an uneven 
map, except in the form log Br. This linearity allows for simpler analysis of compounded 
statistical errors, and simpler convolution for a distribution of selective effects. 

{B D ) ob5 = J dsp(s)B D (s,t) (74) 

One complication is that the mutation rate remains on overall scale factor, as B D °c U d , 
making the magnitude of the measure quantitively more difficult to interpret in the absence 
of an extremely good estimate of the mutation rate. Since the mutation rate cancels in 
the Br statistic, the magnitude allows for qualitative inference of the dominance and 
selective coefficients unhindered by the imprecision of mutation rate estimates. The time 
dependence of the statistic is given simply as 



Bo ~ t U d h y[m~s + t 2 l B sU d + 3 ) + Q(t 3 ). 



(75) 



N 2N E 

In the limit of interest Ib~ 2 » NoS, this expressions simplifies to the following. 

Bo ~ t U d I B ^Jm~s + t 2 3U d sI B + 0(t 3 ) (76) 

Appendix VI: General dominance coefficient 

The analysis above presumes that deleterious mutations act with a single average selective 
effect, either purely additively or purely recessively. One can extend our analysis to the 
case of partial dominance with a general coefficient 1/2 > h > 0, with extreme values 
corresponding to additivity and recessivity, respectively. We ask at what value of h does 
the crossover from B R < 1 to B R > 1 occur. This crossover at some intermediate value 
h = h c is of practical interest, as our statistic only has sensitivity to detect whether the 
average dominance coefficient of a set of alleles lies above or below this critical value. The 
Kolmogorov equation is easily generalized to include a general dominance coefficient. 

d t( p(x, t) = 2NU d 5 x - — + ^d 2 Ml - x)cp) 

+ shd x {x{\ - x)(p(x, f )) + s(l - 2h)d x (x 2 (l - x)(p(x, t)) (77) 
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As detailed in the appendix above, we can use this equation to describe the dynamics of 
the mutation burden. 

d t (x) *U d - s A {x) - s R {x 2 ) (78) 

Here we have defined Sa = sh and s R = s(l — 2h) for convenience, and taken the strong 
selection limit in the initial and final population, such that both 2NqSa » land y/2NoS R » 1 
are satisfied. In this limit, one can compute the dynamics of the moments after a short 
bottleneck with completely relaxed selection in complete analogy to the recessive case 
described above. The perturbative dynamics immediately after re-expansion from the 
bottleneck are well described by the following Taylor expansion. 

<x(f)> * <z(0)> + t d t (x(t))\t=o + \ d 2 (x(t))\t= 0 + 0(t 3 ) 

* <x(0)> + t(ll d - s A (x) - s R (x 2 )) | f=0 - f - (s A d t (x) + s R d t (x 2 )) | f=0 + 0(t 3 ) (79) 

The crossover value occurs when B R = 1, such that (x(t)) founded = (x)o, providing the 
following time dependent condition. 

«x(0)> - (x) Q ) + t(u d - s A {x) - s R {x 2 )) | t=0 - | (s A d t (x) + s R d t {x 2 )) \ t=0 * 0 (80) 

As described above, this can be expressed in terms of the moments of the initial distribution 
(x u )q. The values of Sa, Sr, and all of the moments of the initial distribution are a function 
of the dominance coefficient h, such that the solution to the above equation provides the 
crossover value h c . Given the exponential dependence of the the initial distribution <po(x) 
on h, this equation is generally transcendental and thus requires a numerical solution. 

Notably, the solution h c (t) is an inherently time dependent quantity. The additive re- 
sponse is largely due to accumulation of mutations due to relaxed selection during the 
bottleneck with subsequent decay after re-expansion. In contrast, the recessive response 
occurs largely after re-expansion due to the purging of newly formed deleterious ho- 
mozygotes. As a result, at very early times the crossover occurs close to pure recessivity 
such that h c (t — > 0) ~ 0, since Br < 1 for even partially additive alleles at this time. The 
purely additive case equilibrates far more quickly than the recessive case (tf dax °c s -1 and 
t R , oc s~ l l 2 ), such that purely additive alleles become distinguishable from all other cases 
with even minor excess selection on homozygotes at late times. After this time, nearly 
additive modes begin to decay, such that there is a breakdown in the definition of h c since 
multiple values satisfy the constraint (x(t)) founded = i x )o- After additive alleles have re- 
equilibrated, partially recessive alleles remain detectable in times f R , > t > t A , . with the 

' r J relax relax' 

strongest signal coming from purely recessive alleles at t > t m i„ oc s/ANo/s. This behavior 
is summarized in Figure (5). 

Appendix VII: Long bottleneck limit 

In the case of a long bottleneck of duration T B ~ 0(2N B ) generations, the bottlenecked 
population has had sufficient time to equilibrate into mutation-selection-drift balance 
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Figure 5: ABOVE: Several values of dominance coefficient h are plotted as a func- 
tion of time after re-expansion from the bottleneck. Additive and recessive alleles are 
distinguishable when observing at early times prior to re-equilibration due to additive 
selection. During the equilibration process, the critical value of the dominance coefficient 
h c (t) at which B R = 1 shifts from near pure recessivity (h c ~ 0) at early times to near ad- 
ditivity at late times (h c = 1/2). After additive re-equilibration, partially recessive alleles 
are still detectable (Br > 1) with purely recessive alleles providing the largest signature 
prior to their eventual equilibration. In this plot 2No = 20000, s = 10" 2 , T B = 100 and 
2N B = 2000 such that I B = 0.05. This qualitative behavior is generic for most parameter 
values in the short, low intensity bottleneck limit I B <c 1, however the time dependence of 
h c depends sensitively on these parameters. BELOW: The crossover dominance coefficient 
h c is plotted as a function of time. At early times h c ~ 0 is close to pure recessivity. After 
re-equilibration of additive alleles, h c ~ 1/2, such that only partially recessive alleles pro- 
vide a signature. Any value B R > 1 provides evidence of alleles under partially recessive 
selection, with the largest contribution coming from purely recessive alleles. 
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with the new population size 2N B . The site frequency spectrum can be written in the same 
form given by Equation (1). In the case of recessive variation, we find the following form 
during the bottleneck. 



<t>(x) = e B 



e -2N B sx 2 

X (1 - x) 



Jo 



,2N B sx l 



(81) 



Here we have defined 0 B = 4:N B Ud. In the limit NbS » 1, this can be written approximately 
as follows. 

e -2N B sx 2 

(P(x) * d B — — (82) 

Immediately after re-expansion from the bottleneck, the first three moments of this dis- 
tribution can be easily calculated using the Gaussian integrals described in an appendix 
below. These can be substituted into the Taylor expanded time dependent form for d t (x(t)) 
in Equation (15) to analyze the dynamics and solve for the functional dependence of the 
B R statistic. 

For analysis of bottlenecks of intermediate length, a full non equilibrium description is 
required, but this can be well approximated by analytically patching the solutions given 
by the instantaneous and long bottleneck limits. 



Appendix VIII: Simulations and curve collapse details 

We performed the following analysis using a forward time population simulator, custom 
written in C. For computational speed, the simulator only keeps track of allele frequencies 
in a freely recombining diploid system, rather than containing full genome information. 
We use an infinite sites model with a mutation rate appropriate for 10 8 bases that represents 
the roughly the 30Mb length of the human coding genome. Allele counts in the current 
generation are sampled based on the frequencies in the previous generation p oMl the 
selection coefficient s, and the dominance coefficient h. We calculate the expected frequency 
Pcurrent in the current generation as: 

_ + s) + Ml - Poid)Q. + s)h) 

Pcurrent - ^ + $) + _ + ^ + ^ _ ^ )2) • 

The simulator has arguments for mutation rate, lid, adding new mutations at a probability 
of Ud per base pair per generation, selection coefficient s, dominance coefficient h, a burn- 
in of 300,000 generations where sampling occurs every 100 generations in sped-up mode, 
a transition to sampling every 1 generation at 1000 generations before time t = 0, and 
number of replicates. 

The code was designed to allow for flexible demographic histories, in order to accu- 
rately represent events such as the "Out of Africa" migratory event in human population 
genetic history. For the purposes of comparison to our analytic results, we ran simula- 
tions for a simple, square bottleneck of varying population sizes for both the equilibrium 
population with size 2No = 2 x 10 4 and bottlenecked populations with temporarily re- 
duced sizes of 2N B = {2000,1000,400,200,100} for a duration of T B = {200,100,50,20,10} 
generations. These simulations were performed under both purely additive (h = 0.5) 
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and purely recessive (h = 0) selection, for a wide range of selection coefficients including 
5 = 11,0.1,0.02,0.01,0.001}. 

Here we extend our analysis of the accuracy of our analytic results by continuing to 
scrutinize the comparison with our simulation. The following should be thought of an 
extension to the analysis described in in the main text. 

To represent how the breakdown of our approximation depends on the selective coeffi- 
cient, we plot a subset of the data labeled by selective effect size s in Figure 6. We note that 
deviations from our analytic scaling occur most substantially at large selective effect as 
the intensity is increased, implying that the correct scaling of a more extended bottleneck 
involves a correction to the s dependence. 

Finally, we compare the dependence of f m! „ on the selective effect and bottleneck in- 
tensity in Figure 7. We find a very rough collapse at low intensity, with relatively quick 
deviation as the intensity is increased. At larger intensities, the curves are again stratified 
by selective effect, with large s = 0.1 deviating the fastest. We note here that the collapse 
appears to occur around F 1 ™ = 3t anaytlc , implying scaling by a constant factor of our re- 
sults. In part, this is due to various rough approximations in the integrals ( yfnfl « 1, 
for example), and can be thought of as an empirical correction to the analytic dependence 
that provides reasonable agreement with simulated results. Inclusion of this factor in our 
analysis of B R (t m j n ), produces notably poorer agreement with simulation. As is evidenced 
by the level of noise in Figure 7, t m i n fluctuates more substantially than the magnitude of 
BR(t m j„), making it a harder variable to use for comparison of the analytic predictions and 
simulated results. In part, this is due to the coarseness of measurement only every 100 gen- 
erations in our simulation. We suggest that, when comparing to experimental sequence 
data, one should use the following empirically correction to the analytic dependence from 
Equation (17) to assess the time scale of maximum response to an experimentally observed 
bottleneck. 



Importantly, this is only meant to be a rough guideline to determine the analytic parameter 
dependence, not an exact expression. 

Appendix IX: Synergistic epistasis and recessive selection 

Although the statistical test described above is motivated by the desire to identify alle- 
les under recessive selection, we note that epistatically interacting alleles may exhibit a 
qualitatively similar response to a population bottleneck and subsequent re-expansion. 
Specifically, in the case of synergistic epistasis between deleterious variants, interacting 
alleles may be rapidly purged post re-expansion after rising to substantial frequency due 
to relaxed selection in the bottleneck. This is precisely the case for compound heterozy- 
gotes: a population bottleneck increases carrier frequency, such that after re-expansion 
multiple variants may occur in the same individual. These individuals are less fit and 
are subsequently purged from the population. In this case, the selection term driving the 
motion of the mutation burden is proportional to - £ v Sy(x;Xy) for alleles with frequency Xi 
and Xj rather than proportional to — s(xf), however the transient dynamics are similar due 




(84) 
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to a strong quadratic dependence of the same sign. In a sense, recessive selection can be in- 
terpreted as a subclass of alleles with synergistic effects limited to interactions between the 
same locus on the sister chromosome. As a result, this statistical test should be interpreted 
as a way to detect non-additive negative selection, since both synergistically interacting 
and recessively interacting alleles provide the same qualitative signature B R > 1. 



Appendix X: Evaluating Gaussian integrals 

The steady state distribution prior to the bottleneck is well approximated by the following 
form. 

e -2N 0 sx 2 

(po * 9 Q — (85) 

x(l - x) 

The decay at large frequencies is made even more rapid by the suppressed terms, so for the 
present argument this form is sufficient. Computing the first moment of this distribution 
corresponds to the following integral. 

2N 0 sx 2 /~1 p -2N 0 sx 2 

(86) 



r-l e -2N 0 sx 2 e -2N t 

(x) 0 ~ 0 O dxx — « 0 O dx — — 

Jo x(l-x) J 0 1- 



X 



For sufficiently large 2NoS » 1, the exponential rapidly converges prior to reaching the 
x = 1 upper limit. In this case, in addition to canceling the linear terms in the numerator and 
denominator, the (1 - x) term in the denominator is highly suppressed by the exponential. 
The first moment can be simply computed as half of a Gaussian integral. 



(x) 0 *d 0 dx e- w ^ * ^ dx e~ w ^ 2 
Jo ^ J_ 00 



90 1 U (87) 



/o <■ j-°o 2^ 2N Q s 

Using the following definition, we can compute the first few moments of interest for the 
site frequency spectrum <ft(x) of recessive deleterious mutations. 



Jo 



(x n+L ) 0 °c dxx n e~v x ={ 



(n-l)H 1 Wl 



(2y)«/2 2 ^ > 



2y<- n+1 )/ 2 

The first few moments are given by the following equations. 



for even n 



for odd n 



(88) 



Jo 



-2N 0 sx 2 _ 00 / 71 0n 



<*>o»e„l (89) 
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f 

Jo 



(x 4 ) 0 * 0 O x 3 e~ 2NoSX 



do 20n 



2(2N 0 s) 2 (4N 0 s) 2 
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Figure 6: Here we plot a curve collapse to compare our analytic description to simulated 
data. Values near B R sim /B R analyhc = 1 validate our analytic description. Deviation from this 
line represents a breakdown in the proposed scaling as a function of the intensity and 
selective effect. We find that the collapse is weakly stratified by selective coefficient, even 
in the range of good agreement at low intensity Large selective coefficients s = 0.1 deviate 
fastest, implying a breakdown in the short bottleneck scaling of B R (s). Parameter values of 
2N B = 2000, T B = {200, 100,50,20}, and s = {0.1,0.02,0.01,0.001} are included on the plot. 
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Figure 7: Here we represent the collapse of the peak time t m i n by plotting ^JfTJ°- 
The collapse, although very rough, appears to cluster in the low intensity regime, with 
relatively quick deviation as the intensity is increased. At larger intensities, the curves 
appear to be approximately stratified by selective effect, with large s = 0.1 deviating the 
fastest. The collapse occurs around t / 1 am yfac = 3, implying reseating by a constant factor. 
This provides an empirical correction to the analytic dependence which has reasonable 
agreement with simulated results at low intensity We emphasize that large fluctuations 
in the peak time make this collapse far less informative than the B^{t m i n ) observable. 
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